Personalized query completion suggestion

ABSTRACT

Search query processing includes receiving a search query input string from a user of a mobile device, comparing the search query input to a personalized dictionary of the mobile device, determining a suggested completion for each match in the comparison, and providing the suggested completion to the user for selection. The user can select the suggested completion, if desired, and the completed query can be submitted to a search facility. Personalized dictionaries can be generated from analysis of previously submitted search queries. The analyzed search queries may have been submitted by the user, as well as by other users with similar interests. The analysis of search queries may categorize users into clusters or groups of persons having similar interests.

BACKGROUND

Search applications, such as Internet search engines and mobile navigation systems, receive user input that includes search terms for desired information on locales or destinations and the like. The search applications return search results that are responsive to the search terms and that should contain the desired information. The more desirable search applications assist users in finding the information they seek more accurately and more quickly. Providing a response more quickly is especially important in the mobile environment, where the user input interface is often relatively small, cumbersome, and slow, and where users are often engaged in multiple tasks that compete for attention.

One technique for assisting the user input process is to provide suggested input completion. For example, many mobile navigation systems offer suggested city name and street name completions as a user types the input letters. The system suggestions are based on the received letters, which comprise a text string that can be compared against a database of valid city names and street names. The most likely valid names are offered to the user as suggested completions for the input string. The user is free to accept a suggested completion, or to continue with providing text input letter-by-letter until the user completes the input string. As the user types additional letters, the system may continue to offer name completion suggestions until the user submits an input string.

Query completion is also known for Internet search engines in the mobile context. As a user types letters comprising search terms into an input display box, the mobile search engine application may offer suggested terms (letters and/or words) that would be valid completions to the search query. For example, if a mobile user begins typing into a search engine input box and completes the text string “resta . . . ”, the search engine may select the word “restaurant” from its database and offer the word to the user as a suggested completion for the query text. If the user selects the suggested completion, the search engine will use the accepted completion text in the search. If the user wants to keep typing the input, the user can do so. In either case, the search engine will operate on the search input phrase submitted by the user.

The database relied upon for providing suggested search terms is typically a list of several hundred or perhaps one thousand popular terms. The list of popular terms comprises a query lexicon or dictionary that the search engine can use to compare against already-received letters and terms in a query input to identify the most likely search query completion. The query dictionary is typically downloaded from the search engine server and stored at the user's mobile device, rather than kept only at the search engine server. Downloads can occur periodically according to a schedule or can occur whenever a change to the dictionary is available, or both. Such local storage of the completion database provides faster response time and reduced bandwidth requirements as compared with server-based dictionary storage.

Unfortunately, users may often provide search terms that are not found among the popular terms in standard dictionaries, and therefore the search engine system cannot find meaningful completion suggestions from the database. That is, the completion suggestions are more likely to be rejected by the user. As a result, the user must manually complete the search input string. Search terms that are not found in the local search dictionary represent lost opportunities for assisting the user with more quickly completing the query and receiving useful responses. In addition, when the acceptance rate of completion suggestions is reduced because no meaningful completions are offered, some users may come to suspect that a carrier or search provider is skewing the search terms offered or is attempting to influence the submitted searches for their own purposes, rather than to assist the user. Such suspicions lead to user dissatisfaction, much in the way that non-targeted advertising can have a negative impact on consumers.

Although more valid completion suggestions can be obtained with a larger search dictionary that would have more terms available for suggestions, the increased dictionary size would tax the local data storage capacities of most mobile devices. Thus, increasing the local dictionary size sufficiently to include most of the terms a user might input would be impractical. The search engine server has significant storage capacity for large dictionaries and could be the repository for a larger mobile completion suggestion database, but forcing mobile devices to obtain completion suggestions from the server would significantly increase the response time, defeating the purpose of completion suggestion, and would consume too much system bandwidth. Thus, server-based completion suggestion would provide an unsatisfactory user experience and would be inefficient.

It should be apparent that more efficient schemes for providing search engine query input for mobile devices are desired. The present invention satisfies this need.

SUMMARY

In accordance with embodiments described herein, suggestions for completion of an input query as a user provides query input are based on a dictionary that is specially selected for the user. Because completion suggestions are drawn from this personalized dictionary, the query completion suggestions are personalized for the user. When each query completion suggestion is offered to a user, the suggestion comprises a suggested modification to the input string received thus far from the user, based on comparison of the input string to entries in the personalized dictionary. If the user accepts the offered completion suggestion, then the input query string is modified in accordance with the completion suggestion. The completion suggestions can be provided for an input string already completed or in the process of being input. The personalized query completion suggestion is more likely to provide completion suggestions that are appropriate for the user as compared with systems that utilize the same dictionary for all users. That is, the query completion suggestion from the personalized dictionary is more likely to be what was intended by the user and will more likely be welcomed and trusted by the user. This increases the efficiency of the user input process and improves the user experience. In this way, user acceptance rates of completion suggestions are increased and resource utilization is not compromised.

Embodiments described herein support search query processing that includes receiving a search query input string from a user of a mobile device, comparing the search query input to a personalized dictionary, determining a suggested completion for each match in the comparison, and then providing the suggested completion to the user for selection. The input string can be received from the user via, for example, the device keyboard. The received input string can comprise a single letter or a group of letters or a phrase. Likewise, the suggested modification may be a single letter or a group of letters or a phrase. The user can then select the suggested input query completion or modification, if desired. The completed query can be submitted to a search facility for return of search results.
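As an illustration only, the comparison-and-suggestion step can be sketched as a case-insensitive prefix match against a dictionary that maps each term to a weight. The function name `suggest_completions`, the weights, and the ranking rule (highest weight first, ties alphabetical) are assumptions made for this sketch; the two sample dictionaries mirror the pop-culture and classical-music users of FIG. 2.

```python
from typing import Dict, List


def suggest_completions(typed: str, dictionary: Dict[str, int], limit: int = 4) -> List[str]:
    """Return up to `limit` entries of `dictionary` that begin with `typed`
    (case-insensitive), highest weight first, ties broken alphabetically."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [term for term in dictionary if term.lower().startswith(prefix)]
    matches.sort(key=lambda term: (-dictionary[term], term))
    return matches[:limit]


# Hypothetical personalized dictionaries: term -> weight (e.g., query frequency in the user's group).
pop_dictionary = {"Mariah Carey": 90, "Madonna": 80, "Beyonce": 70, "Modest Mouse": 40, "Macy Gray": 35}
classical_dictionary = {"Mozart": 95, "Bach": 90, "Mahler": 60, "Mendelssohn": 50, "Maria Callas": 45}

print(suggest_completions("m", pop_dictionary))        # ['Mariah Carey', 'Madonna', 'Modest Mouse', 'Macy Gray']
print(suggest_completions("m", classical_dictionary))  # ['Mozart', 'Mahler', 'Mendelssohn', 'Maria Callas']
```

The same input string “m” yields different suggestions solely because each user's device holds a different dictionary; the matching logic itself is identical for both users.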

Embodiments provide personalized dictionaries that can be generated from analysis of previously submitted search queries. The analyzed search queries may have been submitted by a user population comprising the target user (the user for whom the personalized dictionary is intended), as well as by other users, or exclusively by other user populations. Typically, a target user who is new to the search system or service will be provided with a personalized dictionary constructed from user queries by a user population with similar interests to those of the target user. The analysis of search queries may determine clusters or groups having similar interests. The clusters or groups may be identified based on information such as user interests or characteristics, and on information about the search queries themselves, such as search terms, or based on combinations of such information and categories. Target users can be placed or assigned into more than one cluster or group. The personalized dictionary provided to a target user will be fashioned according to the groups with which the target user has been identified. Thus, a target user's personalized dictionary may be a combination of multiple group-specific dictionaries. The personalized dictionary can be stored at the target user's mobile device for efficient utilization of resources.

Other features and advantages of the present invention should be apparent from the following description of exemplary embodiments, which illustrate, by way of example, aspects of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of a mobile device display showing a query input and suggested completion in accordance with the description.

FIG. 2 is an illustration of displays for two mobile devices, each display showing a query input and suggested completion for the two respective different users.

FIG. 3 is a block diagram of a mobile device communication system that operates in accordance with the description.

FIG. 4 is a block diagram of the mobile device illustrated in FIG. 3.

FIG. 5 is a flow chart that illustrates processing to generate a personalized dictionary in accordance with the description.

FIG. 6 is a block diagram of a search service provider that generates the personalized dictionaries in accordance with the description.

DETAILED DESCRIPTION

Embodiments described herein help users find the information they want faster with the help of a personalized query completion suggestion feature in conjunction with search engines. Using a lexicon that is tailored for a user, rather than the same lexicon for all users, provides query completion suggestions more quickly and provides suggestions that are more likely to match the complete query the user intended. References in this description to “Medio Systems” shall be understood to be references to the assignee of the present invention, Medio Systems, Inc. of Seattle, Wash., USA. Numbered headings in this description are provided for the convenience of the reader.

1.0 Summary

Aiding users to more quickly find the information they seek is an important goal of a successful search application. This is especially true in a mobile environment where input mechanisms are typically cumbersome and slow. For example, most users in a mobile environment are typing on a relatively small keyboard and may be engaged in multiple tasks so that typing on a keyboard can only be given intermittent attention. Such users highly value assistance with their input of commands and search terms.

Conventional query completion suggestion systems are not personalized to the preferences of each user. Rather, a single list of 200-1000 popular items is created and the same list is then passed to every system user. When the user starts typing a query, a list of one or more of the most probable completions is shown to the user and the user has the option to select a completion suggestion. But user preferences are diverse, and what would be popular in one group of people would not necessarily be popular in another group. This one-size-fits-all approach can result in lower suggestion acceptance rates and also may result in lower overall user satisfaction, since it can lead a user to believe that the carrier is pushing content of the carrier's choice. The same reasons why non-targeted advertising can be not only ineffective but actually detrimental are applicable to a non-targeted query completion suggestion system.

In accordance with the description herein, user profiles are created by accumulating all the search queries submitted by users in a population. The user population may comprise, for example, all the subscribers to an access (telecommunications) service, or the accumulated search queries may comprise all search queries submitted over a carrier's network over a specified time interval, or all search queries submitted by users in a particular locale over a particular time. Other specifications for accumulations of search queries and user populations for creating the user profiles will occur to those skilled in the art. The collected user search queries are used to define user affinity, i.e., how similar two users are in their preferences and interests. Thus, user groups or clusters are defined over the search queries of the user population. By creating groups of users with similar preferences, a query completion suggestion dictionary for each group can be created. All the users belonging to the same group (i.e., placed or assigned in the same group or groups) may use the same group-specific dictionary rather than using one dictionary created for the entire user population. A target user is a user for whom a personalized dictionary is to be generated, and the personalized dictionary for each target user can be generated in accordance with the target user's group membership. If a target user belongs to multiple groups, that target user's personalized dictionary may comprise a combination of the group-specific dictionaries for the groups to which the target user belongs.
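The description does not fix a particular affinity measure. As one assumed possibility, affinity between two users can be scored with Jaccard similarity over their sets of distinct (normalized) queries: the number of shared queries divided by the number of distinct queries across both users. The names `query_set` and `affinity` are hypothetical.

```python
from typing import Iterable, Set


def query_set(queries: Iterable[str]) -> Set[str]:
    """Distinct normalized queries submitted by one user."""
    return {q.strip().lower() for q in queries if q.strip()}


def affinity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared queries divided by all distinct queries of the two users."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


alice = query_set(["Madonna", "mariah carey", "madonna"])
bob = query_set(["madonna", "Macy Gray"])
carol = query_set(["mozart", "mahler"])
print(affinity(alice, bob))    # 0.333... (1 shared query out of 3 distinct)
print(affinity(alice, carol))  # 0.0
```

Exact-query overlap is a deliberately crude signal; weighted measures over tokenized profiles, such as the clustering step described for box 510, would usually separate interests more finely.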

Personalized query suggestion as described herein can greatly reduce the number of characters that must be typed before a searchable query input is constituted. For example, it is anticipated that users can avoid typing 30%-50% of the characters they would otherwise need to type with a non-personalized conventional search system. This can be a very desirable reduction in input strokes that makes utilization of search applications much more attractive, especially in the context of searching with mobile devices such as Web-enabled cellular phones or WiFi devices.
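One way to express the "characters avoided" figure is as a ratio: for each intended query, count the characters typed before that query first appears in the suggestion list, then compare the total against the full query lengths. The sketch below computes that ratio for a toy dictionary. The metric definition (ignoring the cost of selecting a suggestion), the helper names, and the sample data are assumptions for illustration and do not reproduce the 30%-50% estimate above.

```python
from typing import Callable, List

Suggester = Callable[[str], List[str]]


def make_prefix_suggester(terms: List[str], k: int) -> Suggester:
    """Hypothetical suggester: the first k terms (already ordered by popularity) matching the prefix."""
    def suggest(prefix: str) -> List[str]:
        return [t for t in terms if t.startswith(prefix)][:k]
    return suggest


def chars_typed(target: str, suggest: Suggester) -> int:
    """Characters typed before `target` appears in the suggestion list (len(target) if never)."""
    for n in range(1, len(target) + 1):
        if target in suggest(target[:n]):
            return n
    return len(target)


def keystroke_savings(targets: List[str], suggest: Suggester) -> float:
    """Fraction of characters the user did not have to type (selection cost ignored)."""
    total = sum(len(t) for t in targets)
    if total == 0:
        return 0.0
    typed = sum(chars_typed(t, suggest) for t in targets)
    return (total - typed) / total


suggest = make_prefix_suggester(["mariah carey", "madonna", "modest mouse", "macy gray"], k=2)
print(keystroke_savings(["madonna", "macy gray", "mozart"], suggest))  # 12/22, about 0.545
```

Running the same intended queries against a personalized and a non-personalized dictionary gives a like-for-like comparison of the two.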

2.0 Personalized Query Completion Suggestion

FIG. 1 shows a display page 100 of a mobile device with an example of query completion suggestion in accordance with an embodiment of the invention. The illustrated display shows query completion suggestion as the target user is typing a query. For example, in FIG. 1, the user has typed the letter “m” as the first letter in a query and a drop-down box appears with a number of possible completions. The suggested completions shown in FIG. 1 include “Mariah Carey”, “Madonna”, “Modest Mouse”, and “Macy Gray”. Thus, query completion suggestions are provided to the user in real time as the user types a query input.

Suggesting query completions as the user is typing a query input is a popular feature in most Web search engines (see for example http://labs.google.com/suggest). Suggesting query completions in mobile search, where input mechanisms are typically cumbersome and slow, is of even greater importance than in desktop search. Making query completion suggestion as accurate as possible, thereby reducing the number of characters a user must type before completing a query input, is especially important in mobile search. In accordance with the present invention, one way to achieve more accurate query completions is through personalization of search dictionaries. A personalized system, especially one that learns the preferences of a particular user over time, should offer more accurate query completions than a non-personalized system.

For example, “mariah carey” is currently a very popular search query among mobile users overall, but there are many users that are not interested in this artist. Seeing “mariah carey” as the query completion suggestion every time the user types “m” or “ma” is not only less accurate for those users, but can also lead to lower user satisfaction. The user may be more likely to believe that the carrier is pushing content of the carrier's choice, is more likely to see the query completion suggestion as a promotion, and is therefore less likely to continue using the completion suggestion application. If, on the other hand, the completion suggestions are targeted to the user, then the user will likely view them less as a promotion and more as a convenient feature that can help them find information they are looking for faster.

An important premise of personalized query completion suggestion is that it enables different people to see different suggestions based on their preferences. For example, assume that there are two users, one who likes pop music and celebrity culture, and the other whose interests are disposed to classical music (both of these groups do exist, according to analysis of the received query logs). Using the personalized query completion suggestion described herein, when these two users start typing the same characters for a search, they will see different results. The two different completion suggestion results are illustrated in FIG. 2.

FIG. 2 shows that, in personalized query suggestion, different target users can be provided with different suggestions, depending on their respective preferences. In the left mobile device display 202, a target user has typed “m” as a search query input into an input window 206, and is shown a list 208 of suggested query completions for “m” comprising “Mariah Carey”, “Madonna”, “Modest Mouse”, and “Macy Gray”. These names are produced with a personalized dictionary for a target user who is in a group categorized as interested in music of popular culture. The names are all common in currently popular culture. In the right mobile device display 204, a different target user has typed “m” as a search query input into an input window 210, and is shown a list 212 of suggested query completions for “m” comprising “Mozart”, “Mahler”, “Mendelssohn”, and “Maria Callas”. These names are produced with a personalized dictionary for a target user who is in a group categorized as interested in classical music. The names are all associated with classical music composers and performers. These two respective target users may select a query completion from their respective lists 208, 212, and the selected completed query can be submitted to a search facility. After search processing, the search facility can return identified search results.

A more successful query completion suggestion system has the advantage that fewer queries coming to a vendor's servers will need to be checked for spelling correction and normalization, improving system efficiency and freeing up vendor resources for other tasks.

It should be noted that personalization is not mere memorization (which is often called customization in other contexts). As will be shown in the discussion below, simply memorizing past searches of a user is not a successful strategy. Rather, the full promise of personalization as described herein lies in the fact that inferences can be made about the latent preferences of a user based on his/her search history. The personalized search dictionaries described herein are generated based on data related to prior searches conducted by users in the same group, possibly including the very user for whom the dictionary is personalized. The personalized query completion suggestion system as described herein can suggest query completions that comprise queries that have not been issued before by the user but that are still pertinent to his/her interests.

2.1 Risks of Personalized Query Suggestion

Potential risks of personalized query completion suggestion are no different than the potential risks of any personalization system. If the suggestions are too narrow or too user-specific, even if they are accurate, they may be perceived as intrusive and may raise privacy concerns for the user. On the other hand, care must be taken not to “pigeonhole” a user and show inaccurate completion suggestions. Both of these concerns can be addressed, at least partially, with the techniques described further below. Also, with personalized query completion suggestion, the Quality Assurance (QA) effort of the dictionary provider increases, since now there will be multiple dictionaries rather than one for all users.

Query completion suggestion with personalized dictionaries is most useful in an environment where there are many groups of users with distinct preferences that remain substantially stable over time. An ideal environment for using a personalized dictionary in conjunction with a search facility is for a content store, such as the retail sales environment in which music compositions can be purchased, including full songs and ringtones for mobile devices (e.g., telephones). Other environments, such as general purpose Web search, or where time and/or location of search are crucial factors, might be improved but generally might not benefit as greatly.

Currently, the Google™ search service does not deploy personalized query completion suggestion on “Google Suggest”, citing privacy reasons (see, e.g., http://labs.google.com/suggestfaq.html at item 9), although it apparently has the ability to suggest queries based on one's search history with a Google User Account and Web access (see the explanation at the URL www.google.com/history/, then click on “Interesting Items” on the left side of the screen).

3.0 System Architecture

In the exemplary system described herein, a personalized query completion suggestion system is configured to operate with mobile devices such as cellular telephones that are Web-enabled, Personal Digital Assistants (PDAs), “smart phones”, and the like. An exemplary system is illustrated in FIG. 3.

FIG. 3 shows a mobile device communication system 300 that includes mobile devices 302, such as may include Web-enabled cellular telephones, smart phones, PDAs, and the like. The mobile devices communicate over a network 304, such as the mobile device carrier network, through which the mobile devices also gain access to Web content and the like. An access service provider 306 provides network access to the mobile devices and a search service provider 308 provides search services to users of the mobile devices. The access service and search service may be provided by separate vendors or by the same vendor, and their respective services may be provided by different facilities or the same facility. Users of the mobile devices 302 comprise subscribers to the access service provider 306. A subscription or other access agreement with the search service provider may be necessary. An example of an access service provider 306 is Verizon Wireless®, and an exemplary search service provider 308 is Medio Systems, the assignee of the present invention. The access service provider 306 and the search service provider 308 are adapted for network communications with the mobile devices 302 and each other through respective network communications interfaces 310, 312.

The access service provider 306 authorizes and manages access by the mobile devices 302 to Web content from the Internet 304 over the service provider's network. As such, the access provider can receive information concerning search queries submitted by each mobile device user and can assemble such information into data that relates to the collective search activities of the users. A processor 314 of the access provider accumulates the collective information in a data store 316. The access provider can provide such data to the search service provider 308, or can arrange for the search service provider to gain access to such query information directly from subscribers. The search service provider can comprise an entity that receives submitted search queries over the network 304, such as searches for Web content, and processes the searches to locate the requested content and return the information to the submitting user. The user who submitted the search query is then free to select the located content. Thus, the search provider 308 has sufficient processing, such as computers, network interface equipment, data storage, and the like, to receive search requests, process the requests, and return links to the located content.

The search service provider 308 analyzes the search query information and produces personalized dictionaries. The search service provider develops categorizations (groups) based on the collected search query information and other available information, such as subscriber and user demographic information and generalized search query data, to produce the personalized dictionaries. The search service provider generates the personalized dictionaries using a processor 318. In the illustrated system 300, the personalized search dictionaries are installed at each mobile device 302 such that the installed dictionary is personalized for a target user (e.g., a subscriber) who is associated with the device. That is, a “target user” is a user for whom a personalized dictionary is generated, and a target user is not necessarily a member of the user population whose search queries were used to define the groups and generate the group dictionaries. Each personalized dictionary can be provided from the search service provider 308 to the mobile device 302 of the corresponding user over the network 304. For example, the personalized dictionaries can be provided in the course of regular updates from the service provider, or from download requested by the user (subscriber), or other provider-subscriber communications, as desired.

FIG. 4 is a block diagram of a mobile device 400 such as the devices 302 used in the FIG. 3 system. The mobile device 400 includes a network communications interface 402 through which the mobile device communicates with the network 304 (FIG. 3). A processor 404 controls operations of the mobile device. The processor comprises computer processing circuitry and is typically implemented as one or more integrated circuit chips and associated components. The mobile device includes a memory 406, into which the personalized dictionary can be stored for use with a search application that is executed from the mobile device. A user input component 408 is the mechanism through which a user can provide controls and data. The user input component can comprise, for example, a keyboard or numeric pad or other input mechanism for providing user control and data input. A display 410 provides visual (graphic) output display and an audio component 412 provides audible output for the mobile device.

4.0 System Operation

FIG. 5 illustrates the operations performed in the FIG. 3 system for generating a personalized dictionary in accordance with the invention. The first operation, represented by the diagram box numbered 502, is to obtain raw user-query data profiles. The data profiles are typically derived from search query logs, such as those obtained from the access service provider. The information may be kept in a data warehouse or other depository, to which a search service provider or the like may gain access. The user-query profiles are created containing the raw search queries entered by a user population over a period of time. For example, the raw user-query profiles may contain information that indicates users and corresponding search queries over a time period for each indicated user in the user population. The user-query profile information may be updated on a regular basis or otherwise kept current for each user.
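A minimal sketch of box 502, assuming each log record is a `(user_id, timestamp, query)` tuple; the actual log layout is not specified in the description. Raw queries are grouped by user and kept in time order, restricted to a chosen collection window. The function name and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record layout: (user_id, timestamp, raw query text).
LogRecord = Tuple[str, datetime, str]


def build_user_query_profiles(records: Iterable[LogRecord], start: datetime, end: datetime) -> Dict[str, List[str]]:
    """Group raw queries by user, keeping only queries in [start, end), in time order."""
    profiles: Dict[str, List[str]] = defaultdict(list)
    for user_id, timestamp, query in sorted(records, key=lambda r: r[1]):
        if start <= timestamp < end and query.strip():
            profiles[user_id].append(query)
    return dict(profiles)


logs = [
    ("u1", datetime(2008, 3, 2), "madonna"),
    ("u2", datetime(2008, 3, 1), "mozart requiem"),
    ("u1", datetime(2008, 3, 1), "britny spaers"),
    ("u2", datetime(2008, 5, 1), "mahler"),  # falls outside the window below
]
profiles = build_user_query_profiles(logs, datetime(2008, 3, 1), datetime(2008, 4, 1))
print(profiles)
```

Queries are kept raw at this stage (note the misspelling); correction and tokenization happen in later boxes.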

The next operation, indicated by box 504, is to apply spelling correction. That is, the raw queries can be passed through a spelling correction system. Spelling correction makes the data more manageable by eliminating spurious duplicate entries (for example, a misspelled and a correctly spelled version of the same query). If no spelling rules exist in the system, the spelling rules can be automatically generated from the raw query logs and spelling dictionaries. If desired, the system can store incorrect spellings, such that correction indexes can be improved for spelling correction.
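The spelling correction system is not specified. As a stand-in, the sketch below corrects each word to its closest vocabulary entry using Python's standard `difflib.get_close_matches`, and records each misspelling and its correction so that, as the text suggests, incorrect spellings can be retained to improve later correction. The vocabulary, the 0.8 similarity cutoff, and the function name are assumptions.

```python
import difflib
from typing import Dict, Iterable


def correct_query(query: str, vocabulary: Iterable[str], corrections: Dict[str, str], cutoff: float = 0.8) -> str:
    """Replace each word with its closest vocabulary word (if similar enough) and
    record each misspelling -> correction pair in `corrections`."""
    vocab = list(vocabulary)
    fixed = []
    for word in query.lower().split():
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        if match and match[0] != word:
            corrections[word] = match[0]
            fixed.append(match[0])
        else:
            fixed.append(word)  # already correct, or no sufficiently close candidate
    return " ".join(fixed)


vocabulary = ["britney", "spears", "madonna", "mozart"]
seen: Dict[str, str] = {}
print(correct_query("Britny Spaers", vocabulary, seen))  # britney spears
print(seen)  # {'britny': 'britney', 'spaers': 'spears'}
```

Word-by-word similarity matching is only one option; a production corrector would typically also weigh candidate popularity from the query logs.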

The next operation is to tokenize the queries (box 506). In this operation, search query terms are combined or parsed into tokens and token sequences comprising letters and identifiable words. That is, after spelling correction, the processed search queries comprise a collection of words that can be useful for query completion suggestion, but it is desirable to offer completion suggestions that comprise complete phrases, not just single words. For example, a query of “britney spears” comprises the words “britney” and “spears”, but these two words will be merged into the single token of “britney spears” by the tokenization process of block 506. Other tokens that can be substituted for individual words could include movie titles, names of musical groups, song titles, business names, place names, and the like. Additional details regarding exemplary tokenization processing are provided below.
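A minimal sketch of the merging described for box 506, assuming a known phrase list (for example, artist names and place names drawn from a catalog) and a greedy longest-match scan over the words of each query. Both the phrase list and the greedy strategy are illustrative assumptions rather than the tokenization details the text defers to later.

```python
from typing import List, Set


def tokenize(query: str, phrases: Set[str], max_len: int = 4) -> List[str]:
    """Greedy longest-match tokenization: at each position, take the longest run of
    up to `max_len` words that is a known phrase; otherwise take a single word."""
    words = query.lower().split()
    tokens: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in phrases:
                tokens.append(candidate)
                i += n
                break
    return tokens


# Hypothetical phrase list.
phrases = {"britney spears", "modest mouse", "new york"}
print(tokenize("britney spears ringtone", phrases))      # ['britney spears', 'ringtone']
print(tokenize("Modest Mouse new york tour", phrases))   # ['modest mouse', 'new york', 'tour']
```

Since a single word is always accepted as a fallback token, every iteration advances and the loop terminates.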

After tokenizing the user-query information, the next operation (box 508) is to create user-token profiles. In this operation, the tokenized queries are substituted into the query history to obtain new profiles. That is, in the raw user-query profiles that contain search queries for corresponding users, this operation replaces the search queries with the tokenized search query information from box 506.
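Box 508 can be pictured as mapping each user's query list to a count of tokens. In the sketch below the tokenizer is passed in as a parameter. The stand-in `split_tokenizer` just splits on whitespace, whereas the tokenizer of box 506 would merge phrases such as “britney spears”. Using counts as the profile representation is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List


def build_user_token_profiles(user_queries: Dict[str, List[str]],
                              tokenize: Callable[[str], List[str]]) -> Dict[str, Counter]:
    """Replace each user's raw queries with counts of the tokens those queries produce."""
    return {user: Counter(tok for q in queries for tok in tokenize(q))
            for user, queries in user_queries.items()}


def split_tokenizer(query: str) -> List[str]:
    """Hypothetical stand-in tokenizer: lowercase and split on whitespace."""
    return query.lower().split()


token_profiles = build_user_token_profiles(
    {"u1": ["Madonna tickets", "madonna"], "u2": []}, split_tokenizer)
print(token_profiles)  # {'u1': Counter({'madonna': 2, 'tickets': 1}), 'u2': Counter()}
```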

At the next operation of box 510, one or more clusters or groups are defined, and a target user can be placed in one or more of the groups as described further below, on the basis of preferences as indicated by processing the search queries (that is, the tokenized search queries). The search information can also be supplemented with additional information about users. For example, clustering a user population into groups can be based on search queries in conjunction with demographic information obtained from carriers or from self-reporting by persons in a user population or by target users, or a combination. The granularity of the clustering can be selected in accordance with interests gleaned from the search query information and in consideration of system resources and the capabilities of the mobile devices intended for application. Conventional techniques can be used for identifying clusters from among the query information. For example, various threshold levels may be used to determine when the frequency of search queries directed to particular terms warrants a new cluster for the corresponding subject category. In addition, devices with relatively limited resources and processing power might be limited to personalized dictionaries of restricted size or the group memberships associated with the device might be limited. Further details regarding clustering are provided below. The clustering identifies groups of users with similar preferences.
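The text leaves the clustering technique open ("conventional techniques") and mentions threshold levels. As one assumed conventional choice, the sketch below uses single-pass "leader" clustering over user-token profiles. Each user joins the cluster whose summed token counts are most cosine-similar to the user's profile if that similarity reaches a threshold, and otherwise starts a new cluster. The threshold value, the visiting order, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count profiles."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(profiles: Dict[str, Counter], threshold: float = 0.3) -> List[List[str]]:
    """Single pass over users (sorted by id): join the most similar existing cluster
    if similarity >= threshold, otherwise start a new cluster."""
    clusters: List[List[str]] = []
    centroids: List[Counter] = []  # summed counts; cosine ignores scale, so this acts as the mean
    for user in sorted(profiles):
        profile = profiles[user]
        scores = [cosine(profile, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(user)
            centroids[best].update(profile)
        else:
            clusters.append([user])
            centroids.append(Counter(profile))
    return clusters


token_profiles = {
    "ann": Counter({"madonna": 3, "mariah carey": 2}),
    "bob": Counter({"madonna": 1, "macy gray": 2}),
    "cat": Counter({"mozart": 2, "mahler": 1}),
    "dan": Counter({"mozart": 1, "maria callas": 1}),
}
print(leader_cluster(token_profiles))  # [['ann', 'bob'], ['cat', 'dan']]
```

The threshold controls granularity: raising it yields more, narrower clusters (and more dictionaries to maintain), which ties into the resource considerations above.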

At box 512, a dictionary is created for each identified cluster, or user group. The tokenized query information of all the users whose search queries contributed to a cluster is used to create one dictionary per cluster. Those skilled in the art will be familiar with creating dictionaries based on a set of search query information. In accordance with the present invention, a group of such dictionaries is created, one per cluster, based on search query information of each identified cluster. A target user who does not have a sufficient prior query search history to generate a reliable personalized search dictionary, such as a new user, can be placed into one of the identified clusters on the basis of self-selection, or account information, or a default group, or the like. After the target user receives the personalized dictionary and submits searches, thereby building a search log, that target user can contribute to the search log data and contribute to future processes that define the clusters and produce the cluster dictionaries.
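Box 512 in sketch form, under two assumptions: a cluster dictionary is simply the cluster members' token counts summed, sorted by frequency, and truncated to a size limit (which might be set per device class); and a new user with no history is placed by a hypothetical self-selected interest, falling back to a default cluster.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def build_cluster_dictionary(members: List[str], profiles: Dict[str, Counter],
                             max_size: int = 1000) -> List[Tuple[str, int]]:
    """Sum the members' token counts; return (token, count) pairs, most frequent first."""
    totals: Counter = Counter()
    for user in members:
        totals.update(profiles.get(user, Counter()))
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:max_size]


def cluster_for_new_user(self_selected: Optional[str], interest_to_cluster: Dict[str, int],
                         default_cluster: int = 0) -> int:
    """Place a user without search history by self-selected interest, else the default cluster."""
    if self_selected is None:
        return default_cluster
    return interest_to_cluster.get(self_selected, default_cluster)


token_profiles = {
    "ann": Counter({"madonna": 3, "mariah carey": 2}),
    "bob": Counter({"madonna": 1, "macy gray": 2}),
}
print(build_cluster_dictionary(["ann", "bob"], token_profiles))
# [('madonna', 4), ('macy gray', 2), ('mariah carey', 2)]
print(cluster_for_new_user("classical", {"pop": 0, "classical": 1}))  # 1
```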

At the last operation, box 514, each target user is provided with the personalized dictionary for the cluster or group in which the user is categorized. If the target user is a member of more than one group, as described further below, the target user can be provided with a personalized dictionary that combines the dictionaries of every cluster or group in which the target user is categorized. The personalized dictionary can be provided to the target user through network access. For example, the user can download the personalized dictionary according to the update procedures of the carrier through which the user obtains network access. The personalized dictionary is installed into the target user's mobile device and is automatically employed by the mobile device when the user makes use of a search application to input a query. No change to the operation of the mobile device is necessary to utilize the personalized dictionary produced in accordance with the invention.

From the point of view of the mobile device user, no change to the device operation is necessary to utilize the personalized dictionary described herein. When a user begins typing a query into a search query input window of a search application on the mobile device, the search application consults the personalized search dictionary stored in the device memory and, as the user inputs the query, generates one or more suggestions for completion or modification of the search query based on the personalized dictionary. The suggestions can be shown in a drop-down list of the application display, as illustrated in FIG. 1 and FIG. 2. The user can then designate one of the completion suggestions as the completion of the query being input to the search application. The query thus modified is then submitted to the search query facility.

As noted above, a user can belong to multiple clusters, or groups. For a user who has sufficient search log data, whether that user is categorized into multiple groups depends on whether the frequency of individual search terms input by the user indicates interests that correspond to more than one group. When multiple groups are indicated, the group-specific search dictionaries can be combined into a personalized search dictionary by applying different weights to them, in accordance with the user's search query information. That is, in the case of multiple groups, a target user can be provided with a weighted dictionary according to that user's particular, individual search query information. For each group dictionary, tokens are sorted according to their frequency in that cluster (group). If a target user belongs to multiple clusters or groups, a personalized dictionary can be generated by weighting the frequency of each token in a cluster by the target user's membership in that cluster, and summing the weighted frequencies over all clusters. Thus, each token is associated with a weight according to the target user's group memberships.

For example, in one scenario, four group dictionaries may have been generated from search query data. The exemplary group dictionaries are provided below in Tables 1, 2, 3, and 4, corresponding to categorizations called Cluster 1, Cluster 2, Cluster 3, and Cluster 4. Each table has two columns: the search query terms, and the corresponding frequency counts of those terms in the search query data:

TABLE 1
Cluster 1          Count
family guy          8837
south park          3901
games               1649
simpsons            1084
star wars            999
ringtones            981
eminem               907
tones                902

TABLE 2
Cluster 2          Count
bob marley          4870
sublime             3892
jay z               2080
nas                 1706
eminem              1626
ringtones           1609
akon                1606
snoop dogg          1471

TABLE 3
Cluster 3          Count
disney              5841
wallpaper           1192
tones               1097
games                872
ringtones            833
pix                  721
pirates              619
mickey mouse         468

TABLE 4
Cluster 4          Count
country            10147
tim mcgraw          5493
kenny chesney       5106
rascal flats        4416
tones               4138
ringtones           3980
nickelback          3889
keith urban         3145

Those skilled in the art will understand the probabilistic and statistical methods that may be used to derive group memberships for individual users of a user population. For example, well-known techniques such as mixture-of-multinomials methods may be used. Such methods can determine that, for example, User A has a likelihood measure of 0.8 for membership in Cluster 1 and 0.2 for membership in Cluster 3. In this example, User A has zero membership in Cluster 2 and Cluster 4. With these example values, the count frequency of each term in the personalized dictionary for User A is given by

$$\text{count} = (0.8 \times \text{count in Cluster 1}) + (0 \text{ for Cluster 2}) + (0.2 \times \text{count in Cluster 3}) + (0 \text{ for Cluster 4}),$$

so that the count frequency of the term “games” in the User A personalized dictionary is (0.8*1649)+(0.2*872) = 1319.2+174.4 = 1493.6.
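The weighted combination above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes each group dictionary is held as a mapping from term to frequency count and the user's memberships as a mapping from cluster name to weight, and the function name `combine_dictionaries` is hypothetical. It reproduces the User A figure for “games” from Tables 1 and 3.

```python
from collections import defaultdict


def combine_dictionaries(cluster_dicts, memberships):
    """Build a personalized dictionary as a membership-weighted sum of group dictionaries.

    cluster_dicts: cluster name -> {term: frequency count}   (hypothetical layout)
    memberships:   cluster name -> the target user's membership weight
    """
    combined = defaultdict(float)
    for cluster, weight in memberships.items():
        if weight == 0:
            continue  # clusters with zero membership contribute nothing
        for term, count in cluster_dicts[cluster].items():
            combined[term] += weight * count
    return dict(combined)


# Cluster 1 and Cluster 3 entries copied from Tables 1 and 3.
CLUSTERS = {
    "Cluster 1": {"family guy": 8837, "south park": 3901, "games": 1649, "simpsons": 1084,
                  "star wars": 999, "ringtones": 981, "eminem": 907, "tones": 902},
    "Cluster 3": {"disney": 5841, "wallpaper": 1192, "tones": 1097, "games": 872,
                  "ringtones": 833, "pix": 721, "pirates": 619, "mickey mouse": 468},
}

user_a = combine_dictionaries(CLUSTERS, {"Cluster 1": 0.8, "Cluster 2": 0.0,
                                         "Cluster 3": 0.2, "Cluster 4": 0.0})
print(round(user_a["games"], 1))  # 1493.6
print(int(user_a["games"]))       # 1493, the integer entry kept in the dictionary
```

Terms present in only one cluster (for example “disney”) simply carry that cluster's weighted count.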

After the personalized dictionary is generated and installed on the mobile device of the target user, every time the target user starts typing a search query, the top completion suggestion in the list presented to the user is the entry in the personalized dictionary that starts with the characters typed so far and has the highest frequency count. The second completion suggestion in the list is the matching entry with the second highest frequency count, and so on. The completion suggestion list changes as the target user types, since it always consists of the entries matching the letters typed thus far, ordered by their frequency counts in the personalized dictionary.
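The prefix-match-then-rank rule can be sketched as follows. It is a linear scan for clarity under the assumption that the dictionary is a small in-memory mapping; a device implementation might use a trie or sorted index instead. The function name `suggest_completions`, the `limit` parameter, and the alphabetical tie-break are assumptions of this sketch, and the sample counts are the User A weights applied to Tables 1 and 3, truncated to integers.

```python
def suggest_completions(dictionary, typed, limit=5):
    """Return up to `limit` entries that start with the typed characters,
    ordered by descending frequency count (ties broken alphabetically)."""
    prefix = typed.lower()
    matches = [(term, count) for term, count in dictionary.items()
               if term.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in matches[:limit]]


# Illustrative entries: User A weights applied to Tables 1 and 3, truncated to integers.
USER_A = {"family guy": 7069, "south park": 3120, "games": 1493, "disney": 1168,
          "ringtones": 951, "tones": 941, "simpsons": 867, "star wars": 799}

print(suggest_completions(USER_A, "s"))   # ['south park', 'simpsons', 'star wars']
print(suggest_completions(USER_A, "st"))  # ['star wars']
```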

To provide the personalized dictionary for User A, the Cluster 1 and Cluster 3 dictionaries can be combined into an integrated dictionary using the computation above. For example, in the User A personalized dictionary, the term “games” will have an entry of 1493 (integer value) in the frequency column. Thus, generating the personalized dictionary in the case of multiple group memberships involves some computation of frequency count data from the constituent group dictionaries. The personalized dictionary is stored on the User A device in place of the generic dictionary that would otherwise be stored there. As noted above, the operation of the user's search application with the personalized dictionary is transparent to the user.

5.0 Creating User Profiles

An exemplary process for producing personalized dictionaries in accordance with the invention is described below. The personalized dictionaries can be created using collective search query information from a population of subscribers.

The first operation in creating groups of users with similar preferences is to define a profile for each user. Since the goal is to generate completion suggestions for search queries, logs of completed search queries can be used to define the user profiles for the population of users. The data from the population can be processed so that the most likely valid search queries serve as the basis for identifying groups and fashioning dictionaries. For example, all users with five or more different valid queries can be selected for further processing, and the search queries of users with fewer valid queries can be deleted from the data to be further processed. For a particular query to be considered valid, it must be issued (i.e., submitted) by at least ten other users and must not consist solely of punctuation marks. Other “qualifying” measures can be used to include or exclude data and provide useful dictionaries, in accordance with system requirements.
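The two qualifying rules can be sketched as a filter over raw log records. This assumes the logs are available as `(user_id, query)` pairs, reads “at least ten other users” as ten distinct users besides the one whose profile is being built, and treats ASCII punctuation and whitespace as “punctuation only”; the name `select_users` and its parameters are hypothetical.

```python
import string
from collections import defaultdict

PUNCT_AND_SPACE = string.punctuation + string.whitespace


def select_users(logs, min_other_users=10, min_valid_queries=5):
    """Keep users with at least `min_valid_queries` different valid queries.

    logs: iterable of (user_id, query) pairs taken from the search logs.
    A query counts as valid for a user if at least `min_other_users` other
    users also issued it and it is not made up solely of punctuation.
    """
    logs = list(logs)
    issuers = defaultdict(set)                # query -> users who submitted it
    for user, query in logs:
        issuers[query].add(user)

    valid_by_user = defaultdict(set)
    for user, query in logs:
        if not query.strip(PUNCT_AND_SPACE):
            continue                          # punctuation (or whitespace) only
        if len(issuers[query] - {user}) >= min_other_users:
            valid_by_user[user].add(query)

    return {user: queries for user, queries in valid_by_user.items()
            if len(queries) >= min_valid_queries}


common = ["games", "ringtones", "disney", "tones", "nas", "???"]
logs = [(f"u{i}", q) for i in range(12) for q in common]
logs += [("lonely", q) for q in ["games", "ringtones", "xq1", "xq2", "xq3"]]
kept = select_users(logs)
print(sorted(kept["u0"]))  # ['disney', 'games', 'nas', 'ringtones', 'tones']
print("lonely" in kept)    # False
```

The user “lonely” has only two queries shared by ten or more other users, so that user's data is dropped.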

The query logs from which the groups are identified and the dictionaries are generated can be taken from actual search support system data, such as from search systems available from Medio Systems for telecommunications providers. In general, the search data will be collected over a selected time period from a selected telecommunication service provider (e.g., a cellular telephone service provider). The threshold of five queries per user for inclusion in the data, as described above, is an arbitrary value that should give satisfactory results in most systems. In general, however, care must be taken to ensure that the users selected for inclusion in the search query population have enough user history (i.e., a sufficient number of different valid queries) for group definition and personalization to be effective.

A user profile contains all the queries the user has issued or submitted, along with a count of the number of times each query has been issued by that user. For example, a profile can be similar to the following:

3163905 “games” 2 “sean paul” 3 “free games” 3 “free” 1 “dance dance revolution” 3

The user profile immediately above can be interpreted as: user 3163905 has queried two times for “games”, three times for “sean paul”, three times for “free games”, once for “free”, and three times for “dance dance revolution”.
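The profile layout above can be read into a query-to-count mapping as shown below. The text does not specify a storage format, so this sketch simply assumes one line per user holding the user ID followed by quoted queries and counts. The regular expression, which accepts curly or straight quotes, and the name `parse_profile` are hypothetical.

```python
import re
from collections import Counter

# A quoted query followed by its count; accepts curly or straight double quotes.
PAIR = re.compile(r'[“"]([^”"]+)[”"]\s+(\d+)')


def parse_profile(line):
    """Parse 'user_id "query" count ...' into (user_id, Counter of query counts)."""
    user_id, _, rest = line.strip().partition(" ")
    counts = Counter()
    for query, n in PAIR.findall(rest):
        counts[query] += int(n)
    return user_id, counts


uid, profile = parse_profile(
    '3163905 “games” 2 “sean paul” 3 “free games” 3 “free” 1 “dance dance revolution” 3')
print(uid, profile["sean paul"], sum(profile.values()))  # 3163905 3 12
```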

5.1 Spelling Correction

Even after the user profiles have been defined from the search query data of users who meet the minimum threshold of five valid queries, the profiles can contain a significant number of spelling errors. Spelling errors can be viewed as noise that masks the true preferences and intentions of users. In addition, when dictionaries are created, no misspelled words should appear as suggestions. Therefore, the user profiles should be checked and any spelling errors corrected. The spelling correction process can use, for example, the spelling correction system provided by vendors such as Medio Systems. Such spelling correction systems often closely follow the techniques described in S. Cucerzan and E. Brill, “Spelling correction as an iterative process that exploits the collective knowledge of web users”, cited above and incorporated herein.

Briefly, the spelling correction process works as follows. Tokens and token pairs are extracted from the queries of the user profiles. A token is defined as a string of one or more characters (including punctuation) separated from other tokens by a space or a tab. In one embodiment, the Levenshtein string edit distance is calculated between every two extracted strings: between every pair of tokens, between every pair of token pairs, and between every single token and token pair. The string edit distances can be normalized, where the normalized string edit distance is defined as the total number of insertion, substitution, and deletion operations needed to transform one string into a reference string, divided by the number of characters in the reference string. The reference string is the more frequent of the two strings according to the query search logs. If the normalized string edit distance is lower than 0.2, the pair is retained as a candidate spelling correction.
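A compact sketch of the distance test follows: the standard dynamic-programming Levenshtein distance, normalized by the character length of the more frequent (reference) string, with pairs kept when the result is below 0.2. The frequency table, the helper names `levenshtein` and `candidate_corrections`, and the character-length normalization are assumptions of this illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def candidate_corrections(freq, threshold=0.2):
    """Return (variant, reference) pairs whose normalized distance is below threshold.

    freq: token or token pair -> number of occurrences in the query logs.
    The more frequent string is the reference; distance is divided by its length.
    """
    terms = sorted(freq)
    pairs = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            ref, variant = (a, b) if freq[a] >= freq[b] else (b, a)
            if levenshtein(variant, ref) / len(ref) < threshold:
                pairs.append((variant, ref))
    return pairs


FREQ = {"britney spears": 900, "brittney spears": 40, "eminem": 500, "emimem": 12, "nas": 300}
print(levenshtein("kitten", "sitting"))  # 3
print(candidate_corrections(FREQ))  # [('brittney spears', 'britney spears'), ('emimem', 'eminem')]
```

The all-pairs loop is quadratic in the number of distinct strings, which matches the “every pair” description but would need blocking or indexing at carrier scale.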

String edit distance alone cannot tell which token is the reference, so the string with the highest number of occurrences is assumed to be the reference. Only tokens that are not part of a predefined list are allowed to be corrected. The spelling correction rules are chained together to recognize transitive relations: if “A maps to B” and “B maps to C”, only the “A maps to C” rule is kept. For a comparison of this approach with other spelling correction schemes, see the Cucerzan and Brill document referenced above. After the spelling rules are created, they can be applied to every query in the user profiles. A large fraction of spelling mistakes can be expected to be corrected; for example, approximately one-fourth of unique queries and 5% to 10% of total queries can be expected to be corrected for misspellings.
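The rule chaining and the protected list can be sketched as below. This sketch reads “only the A maps to C rule is kept” as replacing A's rule by its chained target, while B's own rule (B maps to C) remains; that reading, the cycle guard, and the names `chain_rules` and `apply_rules` are assumptions. The sketch applies single-token rules only.

```python
def chain_rules(rules, protected=frozenset()):
    """Resolve chained spelling rules so that A->B and B->C give A->C.

    rules:     misspelling -> correction.
    protected: tokens on the predefined list, which are never corrected.
    """
    resolved = {}
    for source, target in rules.items():
        if source in protected:
            continue
        seen = {source}
        # Follow the chain until the target has no rule, is protected, or repeats.
        while target in rules and target not in protected and target not in seen:
            seen.add(target)
            target = rules[target]
        resolved[source] = target
    return resolved


def apply_rules(query, rules):
    """Apply single-token rules to a query, leaving unmatched tokens unchanged."""
    return " ".join(rules.get(token, token) for token in query.split())


RULES = {"brittany": "brittney", "brittney": "britney", "emimem": "eminem"}
chained = chain_rules(RULES)
print(chained["brittany"])                                # britney
print(apply_rules("brittany spears ringtones", chained))  # britney spears ringtones
```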

5.2 Tokenizing Queries

After the spelling correction process is completed, it is not unusual to have a relatively high number of queries in relation to the number of users. For example, the number of unique queries can be one-half the total number of users in the search-submitting population, and many carriers have hundreds of thousands of users. Query data of this volume can hamper effective statistical modeling. In addition, many queries can express similar intentions. For example, “britney spears”, “rt britney spears”, and “britney spears ringtones” can all express very similar intentions. To reduce the high dimensionality of the user query space, the queries can be broken up into constituent tokens and corresponding tokenized queries can be generated.

Since retaining names and named entities in the data can improve the generated dictionaries, trigger pairs of size 2, 3, and 4 can be constructed. A trigger pair is a pair (or longer sequence) of tokens in which the presence of one causes, or triggers, the presence of the other with high probability. For example, “Christina Aguilera” is a trigger pair, since when “Aguilera” is present in a query, the data indicate that the previous word is almost always “Christina”. Similarly, “Don Omar” is a trigger pair, since when “Don” is present, “Omar” is most likely the next word. Trigger pairs can have any chosen length; in the exemplary system, the maximum trigger pair length is set to four.

For a sequence of tokens to be considered a trigger pair, two conditions are generally required: (a) the support, that is, the probability of one token given the others, must exceed 0.8; and (b) the total number of occurrences of the sequence must be greater than 50. Some examples of likely trigger pairs in currently available data are “lord of the rings”, “Winnie the pooh”, “somewhere over the rainbow”, and the like. In the context of music, artist and band names are typical trigger pairs, along with popular song titles. With the tokenization approach described herein, queries that contain the same tokens in a different order are not treated as distinct entities; they are treated as the same query. For example, the queries “ringtones britney spears” and “britney spears ringtones” are treated as identical. Also, because tokens rather than words are used, “britney spears” is a single token rather than two tokens. This leads to better user-to-user affinity estimates. The tokenization process can drastically reduce the number of terms in the search query data, on the order of 10% to 20% of the number before tokenization.
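Below is a minimal sketch of trigger-pair detection and order-insensitive tokenization. To keep it short, it only handles two-token sequences (the text allows up to four). It estimates support as the larger of P(second | first) and P(first | second) from raw counts and represents a tokenized query as a `frozenset`, so word order is ignored. The support estimate, the data, and the names `find_trigger_pairs` and `tokenize` are assumptions of this illustration.

```python
from collections import Counter


def find_trigger_pairs(queries, min_support=0.8, min_count=50):
    """Find adjacent two-token trigger pairs in a list of raw queries.

    support = max(P(second | first), P(first | second)), estimated from counts.
    Only length-2 sequences are handled; the text also allows lengths 3 and 4.
    """
    unigrams, bigrams = Counter(), Counter()
    for query in queries:
        tokens = query.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    pairs = set()
    for (first, second), n in bigrams.items():
        support = max(n / unigrams[first], n / unigrams[second])
        if support > min_support and n > min_count:
            pairs.add((first, second))
    return pairs


def tokenize(query, pairs):
    """Merge trigger pairs into single tokens and drop word order."""
    words, tokens, i = query.split(), [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in pairs:
            tokens.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return frozenset(tokens)


queries = (["britney spears"] * 60 + ["britney spears ringtones"] * 30
           + ["ringtones"] * 100 + ["free ringtones"] * 20 + ["free games"] * 40)
pairs = find_trigger_pairs(queries)
print(pairs)  # {('britney', 'spears')}
print(tokenize("ringtones britney spears", pairs) == tokenize("britney spears ringtones", pairs))  # True
```

Here “free games” has full support but only 40 occurrences, so condition (b) rejects it.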

5.3 Create User-Token Profiles

After the tokenized queries are generated, the next operation in producing a personalized dictionary is to create user-token profiles. The user-token profiles are created by substituting tokenized queries into each user profile's collection of submitted queries. That is, each raw query is replaced by its tokenized query, thereby reducing the number of tokens, or distinct queries, in the user space.
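The substitution step can be sketched as a re-keying of each profile, with counts of raw queries that tokenize identically summed together. The tokenizer is passed in as a function; the stand-in used here only splits on whitespace and ignores order (a fuller tokenizer would also merge trigger pairs). Both `user_token_profile` and `order_free_tokenize` are hypothetical names.

```python
from collections import Counter


def user_token_profile(profile, tokenize):
    """Replace each raw query in a profile by its tokenized form.

    profile:  raw query -> number of times the user issued it.
    tokenize: function mapping a raw query to a hashable tokenized query.
    Raw queries with the same tokenized form are merged and their counts summed.
    """
    merged = Counter()
    for query, count in profile.items():
        merged[tokenize(query)] += count
    return merged


def order_free_tokenize(query):
    # Stand-in tokenizer for illustration: splits on whitespace and ignores order.
    # A fuller tokenizer would also merge trigger pairs such as "britney spears".
    return frozenset(query.split())


raw = {"britney spears ringtones": 2, "ringtones britney spears": 1, "games": 3}
tokens = user_token_profile(raw, order_free_tokenize)
print(len(raw), "->", len(tokens))  # 3 -> 2
```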

6.0 Creating Group-Specific Dictionaries

6.1 Clustering Users Into Groups

After the user-token profiles are created, the next step is to cluster the users into substantially homogeneous groups. The number of groups and their composition depend on system design priorities and on characteristics of the user profile data, such as whether the data are continuous or discrete, binary or mixed, and the like. Alternative techniques include agglomerative clustering and k-means clustering. For the exemplary system described herein, a mixture of multinomial distributions with appropriate smoothing is fitted to the user profile data. Other considerations will be known to those skilled in the art.

The modeling assumption of the multinomial approach is that each homogeneous group of users corresponds to a different multinomial distribution. This model is fitted to the data using the Expectation-Maximization (EM) framework known to those skilled in the art. The outcome of this process is a vector for each user containing that user's fractional membership in each group or cluster, so the model permits a user to belong fractionally to multiple clusters. Details of the clustering method will be known to those skilled in the art. People have diverse interests, and aggregating all queries into a single dictionary would ignore this fact. In accordance with the invention, the diverse interests of a search query population are recognized and reflected in the group-specific dictionaries.
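The EM fit can be sketched in plain Python. This is a textbook mixture-of-multinomials EM, not the production system's code. It assumes random initial memberships from a seeded generator, additive smoothing `alpha` on each component's token distribution, and a fixed iteration count in place of a convergence test. The name `fit_multinomial_mixture` and all parameter values are illustrative, and the toy profiles are deliberately well separated.

```python
import math
import random


def fit_multinomial_mixture(profiles, k, iterations=50, alpha=0.1, seed=0):
    """Fit a k-component mixture of multinomials with EM.

    profiles: list of {token: count} user-token profiles.
    alpha:    additive smoothing added to every token count in each component.
    Returns one list of k fractional memberships (summing to 1) per user.
    """
    vocab = sorted({token for p in profiles for token in p})
    rng = random.Random(seed)

    # Start from random, normalized memberships so the components can differ.
    resp = []
    for _ in profiles:
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(iterations):
        # M-step: mixing weights and smoothed token distributions per component.
        weights = [sum(r[j] for r in resp) / len(profiles) for j in range(k)]
        theta = []
        for j in range(k):
            counts = {token: alpha for token in vocab}
            for r, p in zip(resp, profiles):
                for token, n in p.items():
                    counts[token] += r[j] * n
            total = sum(counts.values())
            theta.append({token: c / total for token, c in counts.items()})

        # E-step: posterior membership of each user in each component
        # (the multinomial coefficient is the same for every component, so it is dropped).
        resp = []
        for p in profiles:
            logs = [math.log(max(weights[j], 1e-300))
                    + sum(n * math.log(theta[j][token]) for token, n in p.items())
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            norm = sum(exps)
            resp.append([e / norm for e in exps])
    return resp


users = [{"family guy": 20, "south park": 10}, {"family guy": 20, "south park": 10},
         {"country": 25, "tim mcgraw": 10}, {"country": 25, "tim mcgraw": 10}]
memberships = fit_multinomial_mixture(users, k=2)
best = [row.index(max(row)) for row in memberships]
print(best)
```

EM only finds a local optimum, so in practice several random restarts (different `seed` values) would typically be compared by likelihood.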

6.2 Creating a Dictionary for Each Cluster

The next operation is to create a dictionary for each cluster. The profiles of all the users belonging to the same cluster are aggregated (if a user belongs fractionally to a cluster, the user's profile is first multiplied by that fraction), and the frequency of each token occurring in the queries belonging to the cluster is estimated. In this way, the frequency of each query in a cluster is estimated. If desired, the tokens can be edited to eliminate unwanted terms, such as offensive or prohibited terms. In addition, terms or tokens for which the search application will not likely return any meaningful results, such as the names of Web domains, can be removed. Every suggested query completion in a dictionary should return a result; therefore, terms that are not likely to provide a valid result should be removed before the dictionaries are generated. Also, tokens with fewer than five occurrences per cluster can be removed, which will likely remove a significant portion of the tokens that do not return any meaningful result. Thus, a portion of the collective search query information can be excluded from the analysis that includes tokenizing.
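Aggregation with fractional memberships and the two exclusion rules can be sketched as follows. This assumes user-token profiles are `{token: count}` mappings aligned with per-user membership vectors, and that unwanted terms (offensive words, Web domain names) arrive as an explicit block list. The function name `cluster_dictionaries` and the sample data are hypothetical.

```python
from collections import defaultdict


def cluster_dictionaries(token_profiles, memberships, blocked=frozenset(), min_count=5):
    """Aggregate user-token profiles into one frequency dictionary per cluster.

    token_profiles: list of {token: count}, one per user.
    memberships:    list of per-user membership vectors (fractions per cluster).
    blocked:        tokens to drop, e.g. offensive terms or Web domain names.
    Tokens with fewer than `min_count` weighted occurrences in a cluster are removed.
    """
    k = len(memberships[0])
    totals = [defaultdict(float) for _ in range(k)]
    for profile, member in zip(token_profiles, memberships):
        for j, fraction in enumerate(member):
            for token, count in profile.items():
                totals[j][token] += fraction * count  # profile scaled by membership
    return [{t: c for t, c in d.items() if c >= min_count and t not in blocked}
            for d in totals]


profiles = [{"disney": 10, "pirates": 4, "example.com": 9}, {"disney": 6, "games": 12}]
members = [[1.0, 0.0], [0.5, 0.5]]
print(cluster_dictionaries(profiles, members, blocked={"example.com"}))
# [{'disney': 13.0, 'games': 6.0}, {'games': 6.0}]
```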

To control the number of tokens in each dictionary, a pruning process is applied: each token is ranked according to its number of occurrences in the cluster, and the top N tokens are retained. The value of N can be selected according to system resources and the intended application (e.g., mobile device capabilities).
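The top-N pruning step amounts to a sort and a slice. The sketch below applies it to the Cluster 3 entries from Table 3. Breaking ties alphabetically is an assumption added here so the result is deterministic, and `prune` is a hypothetical name.

```python
def prune(dictionary, n):
    """Keep the n most frequent tokens (ties broken alphabetically)."""
    ranked = sorted(dictionary.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:n])


CLUSTER_3 = {"disney": 5841, "wallpaper": 1192, "tones": 1097, "games": 872,
             "ringtones": 833, "pix": 721, "pirates": 619, "mickey mouse": 468}
print(list(prune(CLUSTER_3, 3)))  # ['disney', 'wallpaper', 'tones']
```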

FIG. 6 is a block diagram of processing components 600 in a computer of a search service provider that generates personalized dictionaries in accordance with the disclosure herein. The components may be implemented in software, or in a combination of software and hardware (e.g., firmware), for operation by the processor 318 of the search service provider 308 illustrated in FIG. 3.

The processing components include a User Profile Creating component 602. This component performs the operations described for creating user profiles, such as the operations of box 502 of FIG. 5. The User Profile Creating component performs the analysis of collective search query information at predetermined intervals on updated collective search query information. Also included is a Query Processing and User Token Profile component 604. This component performs query information processing, such as spelling correction, query tokenizing, and creation of user-token profiles (see, e.g., boxes 504, 506, and 508 of FIG. 5 and the corresponding description above). The processing components also include a Search Dictionary component 606 that clusters the users into groups, produces a dictionary for each cluster or group, and generates the personalized dictionary that a target user should receive. The processing of component 606 corresponds to the operations of boxes 510, 512, and 514 of FIG. 5.

7.0 Prediction Issues

Evaluation in a production system can be carried out by monitoring suggestion acceptance rates. Each query carries a bit indicating whether or not it came from a suggestion. To properly validate that personalized query suggestion outperforms non-personalized suggestion, an A/B test should be applied, in which one randomly selected user group receives a non-personalized dictionary and another randomly selected group uses personalized dictionaries. The suggestion acceptance rate of the second group should be higher. Dictionaries of equal size must be used in both groups.
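The comparison can be sketched from the per-query suggestion bit. The text only states that the personalized group's acceptance rate should be higher. The pooled two-proportion z statistic used here is a standard addition, not something the source specifies: it is one common way to check that the observed difference is unlikely to be noise. The function names and the example counts are hypothetical.

```python
import math


def acceptance_rate(from_suggestion_bits):
    """Fraction of queries whose 'came from a suggestion' bit is set."""
    bits = list(from_suggestion_bits)
    return sum(bits) / len(bits)


def two_proportion_z(accepted_a, total_a, accepted_b, total_b):
    """Pooled two-proportion z statistic for rate_b - rate_a."""
    rate_a, rate_b = accepted_a / total_a, accepted_b / total_b
    pooled = (accepted_a + accepted_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (rate_b - rate_a) / se


print(acceptance_rate([True, False, False, True]))  # 0.5
# Hypothetical counts: group A (non-personalized) vs group B (personalized).
z = two_proportion_z(300, 1000, 360, 1000)
print(round(z, 2))  # 2.85
```

A z value above about 1.96 corresponds to a two-sided 5% significance level.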

Alternative embodiments can be provided for the personalized query completion suggestion technique described thus far. The main requirement is the ability of the disclosed system to deliver different dictionaries to different users. This is a server-side function and a prerequisite for any personalization effort. No changes should need to be made to the client-side operation: once the dictionary is loaded onto a rich client, it is processed exactly as before. Also, no changes need to be made to the indexing process, since this is not a personalized search feature.

In many cases, some effort will likely be required from the Quality Assurance (QA) department for successful implementation, since the dictionaries need to provide confidence that (a) there are no adult, offensive, or otherwise inappropriate terms in any of the dictionaries, and (b) there are no completion suggestions in any of the dictionaries that do not return a result. These assurances can be provided by a combination of excluding a portion of the collective query information from analysis (such as offensive and inappropriate search terms) and processing the generated personalized dictionaries before delivery (such as eliminating completion suggestions that do not return a result).

Also, the dictionaries should be updated regularly (perhaps once a week), and so should the user memberships in the various clusters. This process can be automated for greatest efficiency. Thus, the search provider can regularly revise membership in the cluster groups and regenerate the personalized dictionaries based on newly collected search query data. The updates can incorporate new data from the search query logs, including new search queries and new users. In this way, a target user contributes to the search query logs from which user profiles are created and group dictionaries are produced, so the system described herein can learn the preferences of the target user and provide more useful query completion suggestions.

8.0 Conclusions

Helping users find what they want faster, especially on a mobile device, is crucial for improved user satisfaction. A proof-of-concept personalized query completion suggestion system was detailed and quantitative results were shown. The results are very encouraging and show that a personalized query completion suggestion system can yield significant savings in typing time compared to a non-personalized system. An important point is that the personalized query completion suggestion system is not a mere memorization system. Simply memorizing a user's previous searches and using them for query completion suggestion usually results in poor savings. The key is to place each user into a larger context that is still narrow enough to be useful, and this is what clustering in accordance with the invention does. Clustering is not the only statistical machinery that can achieve personalization; other methods, such as factor analysis, can also be used. However, the improvement of such methods over cluster-based personalization is not likely to be as significant as the improvement of cluster-based personalization over no personalization.

The same principle behind the personalized query completion suggestion system described herein can be applied to other areas of the search experience, namely personalized voice search, personalized recommendations, and generating personalized item popularity.

Thus, a personalized search dictionary is a search dictionary generated for a particular target user, according to a user profile for that user. The user profile may be created by analyzing the target user's search queries in relation to a user population, or by assigning the target user to a group, or to one or more groups in a collection of groups, based on demographics, self-reporting, self-selection, questionnaire, and so forth. That is, the target user is the user for whom the personalized dictionary is to be generated. For each defined group, a corresponding group search dictionary is produced from the search queries submitted by users in a user population. A target user may or may not be a member of the user population from which the personalized dictionary is generated. In fact, at initial sign-up, the target user is most likely not a member of the search query user population, because the target user is new to the system and likely has not submitted enough search queries to contribute to the search query data. The target user is not necessarily a person, but is associated with a user account maintained by the service provider, through which the personalized search dictionary is provided and the search queries are submitted.

The dictionary generating technique described herein can provide a personalized dictionary tailored to a target user, so that it includes terms with greater relevance to the target user's likely search queries and can therefore provide more useful query completion suggestions. Because the personalized dictionary is tailored to the target user, terms that are not likely to be relevant to the user can be deleted or excluded, producing a more user-relevant dictionary that can provide a user experience comparable to that of much larger general-purpose completion dictionaries shared across larger user populations. The personalized dictionary described herein can therefore be smaller than a comparable general-purpose dictionary. The reduced size makes it more likely that the personalized dictionary fits in the storage of devices with relatively modest resources. Thus, the personalized dictionary can be stored in the memory of a mobile device, where a search application can readily access it without communicating with a general-purpose dictionary stored at a remote network location. Once it is available on the mobile device, the personalized dictionary can be coupled to a search application of the device to provide query completion suggestions, as described herein.

As described herein, with the personalized query completion suggestion and personalized lexicon, the query completion suggestion is more likely to be what the user intended and is more likely to be welcomed and trusted by the user. This is especially useful in the mobile device context, where speed of input and efficiency of operation are highly prized. The technique applies to a wide variety of devices and is especially suited to mobile devices such as smart cell telephones with mobile Web capability and other mobile Web-enabled devices such as WiFi-capable devices. The technique also applies to a variety of input mechanisms, such as alphanumeric input from conventional keyboards, and voice input.

The present invention has been described above in terms of a presently preferred embodiment so that an understanding of the present invention can be conveyed. There are, however, many configurations for mobile enterprise data systems not specifically described herein to which the present invention is applicable. The present invention should therefore not be seen as limited to the particular embodiments described herein; rather, it should be understood that the present invention has wide applicability with respect to mobile enterprise data systems generally. All modifications, variations, or equivalent arrangements and implementations that are within the scope of the attached claims should therefore be considered within the scope of the invention.

1. A computer implemented method of search query processing, the methodcomprising: receiving a search query input from a user of a mobiledevice; comparing the search query input to a personalized searchdictionary and determining a suggested completion for each match in thecomparison; providing the suggested completion to the user forselection.
 2. The method as in claim 1, wherein the suggested completioncomprises one or more query strings, each of which provides a completedquery input string if selected by the target user.
 3. The method as inclaim 2, wherein the query strings comprise multiple alternativecompleted query input strings.
 4. The method as in claim 1, wherein thepersonalized search dictionary is generated from analysis of collectivesearch query information that includes data related to search queriessubmitted by a predetermined user population over a period of time. 5.The method as in claim 4, wherein the collective search queryinformation includes data related to search queries submitted by thetarget user over a period of time.
 6. The method as in claim 4, whereinthe target user is not a member of the predetermined user population. 7.The method as in claim 4, wherein the analysis of collective searchquery information comprises: identifying a plurality of previouslysubmitted search queries of a population of search users; determininggroups of the search users based on the previously submitted searchqueries; generating a group search dictionary for each one of thedetermined groups.
 8. The method as in claim 7, wherein the target useris placed into one or more of the determined groups, and thepersonalized search dictionary is generated according to the usermembership to each determined group into which the target user isplaced.
 9. The method as in claim 8, wherein the personalized searchdictionary is installed in the mobile device.
 10. The method as in claim4, wherein the analysis of collective search query information isrepeated at predetermined intervals on updated collective search queryinformation.
 11. The method as in claim 1, wherein the personalizedsearch dictionary is generated from among a predetermined set of groupsearch dictionaries.
 12. The method as in claim 1, wherein thepersonalized search dictionary is generated from analysis of collectivesearch query information such that a portion of the collective searchquery information is excluded from the analysis.
 13. The method as inclaim 1, wherein a selected suggested completion provides a completedquery input string, which is provided to a search service provider forprocessing.
 14. A method as in claim 1, wherein the input string is asingle character.
 15. A method as in claim 1, wherein the input stringis a plurality of characters.
 16. A method as in claim 1, wherein thesearch query input is received from a keyboard of the mobile device. 17.A computer system for generating a search dictionary, the systemcomprising: a user profile creating component that performs analysis ofcollective search query information that includes data related to searchqueries submitted by a user population over a period of time and createsuser profiles; a query processing and user token profile component thatprocesses the user profiles and performs query information processing,including spelling correction, query tokenizing, and creation ofuser-token profiles; a search dictionary component that identifies oneor more groups into which the user profiles are placed and generates asearch dictionary for each identified group.
18. The system as in claim 17, wherein the search dictionary component assigns a target user into one or more of the identified groups and generates a personalized search dictionary in accordance with the assigned groups and search dictionaries for the target user.
19. The system as in claim 18, wherein the personalized search dictionary is adapted for installation in a mobile device.
20. The system as in claim 18, wherein the search dictionary component generates the personalized search dictionary from analysis of collective search query information that includes data related to search queries submitted by a predetermined user population over a period of time.
21. The system as in claim 20, wherein the collective search query information includes data related to search queries submitted by the user over a period of time.
22. The system as in claim 20, wherein the target user is not a member of the predetermined user population.
23. The system as in claim 20, wherein the user profile creating component performs the analysis of collective search query information by identifying a plurality of previously submitted search queries of a population of search users, determining groups of the search users based on the previously submitted search queries, and generating a group search dictionary for each one of the determined groups.
24. The system as in claim 23, wherein the target user is placed into one or more of the determined groups, and the personalized search dictionary is generated according to the user membership to each determined group into which the target user is placed.
25. The system as in claim 18, wherein the analysis of collective search query information is repeated at predetermined intervals on updated collective search query information.
26. The system as in claim 18, wherein the user profile creating component performs the analysis of collective search query information at predetermined intervals on updated collective search query information.
27. The system as in claim 18, wherein the query processing and user token profile component excludes a portion of the collective search query information from the query information processing.
28. The system as in claim 18, wherein the query processing and user token profile component performs tokenizing by identifying trigger pairs in the user profiles.
29. The system as in claim 18, wherein the personalized search dictionary component determines the personalized search dictionary that the target user should receive.
30. The system as in claim 18, wherein the personalized search dictionary is generated from among a predetermined set of group search dictionaries.
31. A method of generating a search dictionary for a computing device, the method comprising: creating user profiles from analysis of user query information collected from search queries submitted by a user population over a period of time; tokenizing the user profiles and creating user-token profiles; defining one or more groups based on the user-token profiles; and creating a group-specific search dictionary for each defined group.
32. The method as in claim 31, further including: assigning a target user to one or more of the defined groups; and providing the target user with a personalized search dictionary in accordance with the one or more defined groups to which the target user is assigned.
33. The method as in claim 31, wherein the target user is a member of the user population.
34. The method as in claim 31, wherein a portion of the collective search query information is excluded from the tokenizing.
35. The method as in claim 31, wherein tokenizing includes identifying trigger pairs in the user profiles.
36. The method as in claim 31, wherein the personalized search dictionary is generated from analysis of collective search query information that includes data related to search queries submitted by a predetermined user population over a period of time.
37. The method as in claim 36, wherein the collective search query information includes data related to search queries submitted by the target user over a period of time.
38. The method as in claim 36, wherein the target user is not a member of the predetermined user population.
39. The method as in claim 36, wherein the analysis of collective search query information comprises: identifying a plurality of previously submitted search queries of a population of search users; determining groups of the search users based on the previously submitted search queries; and generating a group search dictionary for each one of the determined groups.
40. The method as in claim 39, wherein the target user is placed into one or more of the determined groups, and the personalized search dictionary is generated according to the user membership to each determined group into which the target user is placed.
41. The method as in claim 40, wherein the personalized search dictionary is adapted for installation in a mobile device.
42. The method as in claim 36, wherein the analysis of collective search query information is repeated at predetermined intervals on updated collective search query information.
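The dictionary-generation steps of claims 31 and 32, together with the prefix completion referred to in claims 13 to 15, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation. The stop list stands in for the excluded portion of claim 34. Single-token profiles replace the trigger pairs of claims 28 and 35. Cosine similarity, greedy "leader" clustering with a 0.3 threshold, the sample query log, and every function name are hypothetical choices, since the claims do not specify them. Spelling correction (claim 17) is omitted.

```python
from collections import Counter
from math import sqrt

# Hypothetical stop list standing in for the portion of query information
# excluded from tokenizing (claim 34); an assumption, not from the source.
STOP_WORDS = {"the", "a", "in", "of", "near"}


def tokenize(query: str) -> list[str]:
    """Lowercase and split a query, dropping excluded tokens."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]


def user_token_profiles(query_log: dict[str, list[str]]) -> dict[str, Counter]:
    """One token-frequency profile per user, built from that user's past queries."""
    return {user: Counter(t for q in queries for t in tokenize(q))
            for user, queries in query_log.items()}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def define_groups(profiles: dict[str, Counter], threshold: float = 0.3) -> list[dict]:
    """Hypothetical greedy 'leader' clustering: a user joins the first group
    whose summed profile (centroid) is similar enough, else starts a new group."""
    groups: list[dict] = []
    for user in sorted(profiles):
        profile = profiles[user]
        for g in groups:
            if cosine(profile, g["centroid"]) >= threshold:
                g["members"].append(user)
                g["centroid"].update(profile)  # add counts into the group total
                break
        else:
            groups.append({"members": [user], "centroid": Counter(profile)})
    return groups


def group_dictionary(group: dict, query_log: dict[str, list[str]]) -> Counter:
    """Group-specific dictionary: how often each full query was submitted by members."""
    return Counter(q.lower() for u in group["members"] for q in query_log[u])


def personalized_dictionary(target_queries: list[str], groups: list[dict],
                            dictionaries: list[Counter],
                            threshold: float = 0.3) -> Counter:
    """Assign the target user to every sufficiently similar group and merge
    those groups' dictionaries with the user's own queries."""
    target = Counter(t for q in target_queries for t in tokenize(q))
    merged = Counter(q.lower() for q in target_queries)
    for g, d in zip(groups, dictionaries):
        if cosine(target, g["centroid"]) >= threshold:
            merged.update(d)
    return merged


def suggest(prefix: str, dictionary: Counter, k: int = 3) -> list[str]:
    """Up to k entries starting with prefix, most frequent first, ties alphabetical."""
    prefix = prefix.lower()
    matches = [q for q in dictionary if q.startswith(prefix)]
    return sorted(matches, key=lambda q: (-dictionary[q], q))[:k]


# Hypothetical sample query log for a tiny user population.
query_log = {
    "alice": ["sushi near me", "sushi seattle", "ramen seattle"],
    "bob": ["sushi seattle", "ramen shops"],
    "carol": ["hiking trails", "hiking boots"],
}
groups = define_groups(user_token_profiles(query_log))
dictionaries = [group_dictionary(g, query_log) for g in groups]
personal = personalized_dictionary(["sushi delivery"], groups, dictionaries)
print(suggest("su", personal))  # ['sushi seattle', 'sushi delivery', 'sushi near me']
```

In this sketch, the target user "sushi delivery" is placed only in the group formed by alice and bob, so hiking queries never appear among the suggestions. Ranking by merged group frequency lets a query that several similar users submitted ("sushi seattle") surface ahead of the target user's own single query.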